The Weighted Factors Automaton : A Tool for DNA Sequences Analysis

Authors

  • Christiane Hespel
  • Farida Benmakrouha
  • Danielle Quichaud
Abstract

Many computing tools, such as trees, automata, and dictionaries, are used to analyze DNA sequences, each one reserved for a particular problem. A. Blumer et al. proposed a more general tool: the smallest automaton recognizing the subwords of a text (DAWG). In this paper we propose the concept of a “weighted factors automaton”, which produces every occurrence of any factor. Its transitions are labeled by the letter read and weighted by the set of indices at which factors begin. A factor is obtained by concatenating the letters read, and the indices of its beginnings are obtained by intersecting the weighting sets while advancing from the initial state to a final state. We believe this automaton can be processed more easily than the DAWG, and we present a comparison between the two: the set of beginning indices of a factor and the factor frequencies are obtained more easily with our automaton, and restricting our automaton to factors of length ≤ k preserves the automaton structure, whereas the DAWG cannot easily be restricted. The applications are numerous: by selecting factors of length 1 we obtain the coding regions, and by selecting factors of length 3 we obtain the expression level of a gene. The “weighted factors automaton” also allows us to find pattern matches and to study homology, significantly simplifying the FASTA and BLAST algorithms.
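
The abstract does not give the construction itself, so the following is only a minimal sketch of the intersection idea, under assumptions of our own: a layered layout in which the transition at depth j on letter a carries the set of start indices i with text[i + j] == a, restricted to factors of length ≤ k. The names `build_weights` and `occurrences` are hypothetical and do not come from the paper.

```python
from collections import defaultdict


def build_weights(text, k):
    """Assumed layout: weights[j][a] is the set of start indices i
    such that text[i + j] == a, for depths j = 0 .. k-1."""
    weights = [defaultdict(set) for _ in range(k)]
    for i in range(len(text)):
        for j in range(k):
            if i + j < len(text):
                weights[j][text[i + j]].add(i)
    return weights


def occurrences(weights, text_len, factor):
    """Start indices of `factor`, found by intersecting the weighting
    sets met while reading the factor letter by letter."""
    if len(factor) > len(weights):
        raise ValueError("factor is longer than the restriction k")
    starts = set(range(text_len))
    for j, letter in enumerate(factor):
        starts &= weights[j].get(letter, set())
        if not starts:  # no occurrence can survive further letters
            break
    return starts


if __name__ == "__main__":
    text = "ACGTACGA"
    w = build_weights(text, 3)
    print(sorted(occurrences(w, len(text), "ACG")))  # [0, 4]
    print(len(occurrences(w, len(text), "A")))       # frequency of "A": 3
```

In this sketch the frequency of a factor is the size of the resulting set, and restricting to length ≤ k only means building k layers.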

Similar resources

Molecular and Bioinformatics Analysis of Allelic Diversity in IGFBP2 Gene Promoter in Indigenous Makuee and Lori-Bakhtiari Sheep Breeds

The aim of this study was to perform molecular and bioinformatics analysis of the IGFBP2 gene promoter in association with some economic traits in indigenous Makuee (MS) and Lori-Bakhtiari (LB) breeds. DNA was extracted from blood samples of 120 MS and 200 LB and a 297 bp fragment from the upstream sequences of the studied gene was amplified and genotyped by single-strand conformational polymo...

Research Article: Molecular genetic divergence of five genera of cypriniform fish in Iran assessed by DNA barcoding

The present study represents a comprehensive molecular assessment of some families of freshwater fishes in Iran. We analyzed cytochrome oxidase I (COI) sequences for five genera of cypriniform fishes from Iran. The present investigation provides data on the genetic structure of some species of Nemachilidae including Paraschistura bampurensis, Oxynoemacheilus kiabii and Turcinemacheilus saadii and Leuc...

Simple Sequence Repeats Amplification: a Tool to Survey the Genetic Background of Olive Oils

A reliable DNA extraction method for use on extra virgin olive oil based on a commercial kit was defined, and the possibility of using this DNA for fingerprinting the original cultivar was demonstrated. The genetic traceability of single-cultivar virgin olive oil from two cultivars (Carolea and Frantoio) was achieved by identifying the varieties from which they were produced. This involved the ...

A comparative phylogenetic analysis of Theileria spp. by using two "18S ribosomal RNA" and "Theileria annulata merozoite surface antigen" gene sequences

More than 185 species, strains and unclassified Theileria parasites are categorized in the Entrez Taxonomy. The accurate diagnosis and proper identification of the causative agents are important for understanding the epidemiology, prevention and appropriate treatment. This study aims to discuss the importance of two genes of Theileria annulata 18S ribosomal RNA (18S rRNA) and Theileria annulata...

Genetic analysis of polyketide synthase and peptide synthase genes of cyanobacteria as a mining tool for new pharmaceutical compounds

Cyanobacteria are considered a promising source for new pharmaceutical lead compounds and a large number of chemically diverse and bioactive metabolites have been obtained from cyanobacteria. Despite several worldwide studies on the prevalence of NRPSs and PKSs among the cyanobacteria, none of them included Iranian cyanobacteria of Kermanshah province. Therefore, the aim of this study was t...


Publication date: 2013